Cracking the absurd beauty of Nvidia's GauGAN 2 AI image machine

2021-11-24 04:45:33 By : Ms. Gao Aria

Entering nonsense phrases into Nvidia's algorithm can produce some fascinating "errors": sometimes beautiful, sometimes unfortunate, and in most cases intriguing.

Author: Tiernan Ray | November 22, 2021 | Topic: Artificial Intelligence

Enter the words "ZDNet Superb Report" in Nvidia's GauGAN 2 AI program, and surrealistic images will be automatically generated.

Enter the words "ZDNet Superb Report" into Nvidia's new artificial intelligence demonstration, GauGAN 2, and you will see a picture that looks like a large piece of foam insulation wrestling in a lake against a snowy background.

Add more words, such as "ZDNet wonderful report", and you will see the image take on some new, almost unrecognizable form: perhaps a digested Formula One car, lying along what looks a bit like a road, in front of the blurred outline of a man-made structure.

GauGAN 2 produced a strange interpretation of the phrase "ZDNet wonderful report".

Roll the dice, using the small button with the image of two dice, and the same phrase becomes a ghostly, misty landscape with a yawning mouth, something organic in nature but of no recognizable species.

Another roll of the dice produced this bizarre landscape-plus-creature.

Entering a phrase is the latest way to control GauGAN, an algorithm developed by graphics chip giant Nvidia to showcase the state of the art in artificial intelligence. The original GauGAN program was launched in early 2019 as a way to draw a picture and have the program automatically generate a realistic image by filling in the drawing.

The term "GAN" in the name refers to a large class of neural network programs, called generative adversarial networks, introduced by Ian Goodfellow and colleagues in 2014. A GAN uses two neural networks that work at cross purposes: one produces output, which it steadily refines until the second neural network judges the output valid. The competitive back-and-forth is why they are called "adversarial".
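
The back-and-forth can be made concrete with a toy calculation. The sketch below is an illustration only, not Nvidia's code: it assumes the standard binary cross-entropy form of the GAN objective, and the function names and scores are hypothetical. It computes the two competing losses from the second network's scores and does not train anything.

```python
import math

def discriminator_loss(real_scores, fake_scores):
    """Average binary cross-entropy for the second network (the judge).

    Scores are probabilities in (0, 1) that an image is real.
    The loss is low when real images score high and fakes score low.
    """
    real_term = [-math.log(s) for s in real_scores]
    fake_term = [-math.log(1.0 - s) for s in fake_scores]
    return (sum(real_term) + sum(fake_term)) / (len(real_scores) + len(fake_scores))

def generator_loss(fake_scores):
    """Loss for the first network (the producer), in the common
    "non-saturating" form: low when the judge is fooled."""
    return sum(-math.log(s) for s in fake_scores) / len(fake_scores)

# Hypothetical scores early in training: the judge spots the fakes easily.
early = {"real": [0.9, 0.8], "fake": [0.1, 0.2]}
# Hypothetical scores later: the fakes have become harder to tell apart.
late = {"real": [0.6, 0.5], "fake": [0.5, 0.4]}

for name, s in (("early", early), ("late", late)):
    d_loss = discriminator_loss(s["real"], s["fake"])
    g_loss = generator_loss(s["fake"])
    print(name, "judge loss:", round(d_loss, 3), "producer loss:", round(g_loss, 3))
```

As the producer improves, its loss falls while the judge's loss rises, which is the tug-of-war the word "adversarial" describes.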

Nvidia has done pioneering work in extending GANs, including StyleGAN, introduced in 2018, which made it possible to generate highly realistic fake photos of people. In that work, the neural network "learns" high-level and low-level aspects of a face, such as skin color.

In the original GauGAN in 2019, Nvidia used a similar approach to let people draw a landscape as a set of regions, called a segmentation map. Those high-level abstractions, such as lakes, rivers and fields, become a structural template, and the GauGAN program then fills in the drawn segmentation map with realistic, real-world detail.
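
To make the idea of a segmentation map concrete, here is a minimal sketch that stores one as a grid of region labels. It is an assumption-laden illustration: the label set, the grid and the helper names are hypothetical, and the article does not describe the format GauGAN uses internally.

```python
from collections import Counter

# Hypothetical label set; the labels GauGAN actually uses are not listed in the article.
LABELS = {0: "sky", 1: "mountain", 2: "lake", 3: "field"}

# A tiny 4x6 segmentation map: each cell holds the label of the region drawn there.
segmentation_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [2, 2, 2, 1, 1, 3],
    [2, 2, 2, 3, 3, 3],
]

def region_areas(seg_map):
    """Count how many cells each labeled region covers."""
    counts = Counter(label for row in seg_map for label in row)
    return {LABELS[label]: n for label, n in counts.items()}

def one_hot(seg_map, num_classes):
    """Turn the label grid into one binary mask per class,
    a common input format for image-generating networks."""
    return [
        [[1 if label == c else 0 for label in row] for row in seg_map]
        for c in range(num_classes)
    ]

masks = one_hot(segmentation_map, len(LABELS))
print(region_areas(segmentation_map))
print(masks[2])  # binary mask marking where the "lake" region was drawn
```

The grid says only where each kind of thing goes; what a lake or a field actually looks like is left to the generator.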

The second version of the program has been updated to handle language. The idea is to prompt GauGAN 2 with sensible phrases that relate to landscapes, such as "coastal ripples and cliffs." The GauGAN 2 program responds by generating a realistic scene that matches the input.

Nvidia said that the program was developed, in the "training" phase, on its Selene supercomputer, which is built from Nvidia GPUs, using 10 million high-quality landscape images.

Segmentation maps can also be created automatically, allowing people to go back and edit the layout of the landscape in the same way the original GauGAN allowed them to draw it.

As Nvidia describes GauGAN 2 in a blog post, the combination of text, images and segmentation maps is a breakthrough in multimodal AI:

GauGAN2 combines segmentation mapping, inpainting and text-to-image generation in a single model, making it a powerful tool to create photorealistic art with a mix of words and drawings. The demo is one of the first to combine multiple modalities (text, semantic segmentation, sketch and style) within a single GAN framework. This makes it faster and easier to turn an artist's vision into a high-quality AI-generated image.

The practical benefit, Nvidia said, is that people can use a few words to put together a basic image without any drawing at all, and then adjust the details to refine the final output.

But adding words that have nothing to do with scenery, such as "ZDNet", starts to produce crazy artifacts, which are sometimes off-putting and weird, sometimes striking; it depends on your taste. In the terminology of deep learning, the weird images produced by nonsense phrases arise because the program has to deal with "out-of-distribution" language, meaning language that was not captured in the training data provided to the machine. Faced with phrases it cannot reconcile, the program struggles to match an image to the phrase.
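
A crude way to picture out-of-distribution input is to check a prompt against a list of words the system is known to have seen. The sketch below does only that. It is a loose analogy under stated assumptions: the vocabulary and function names are hypothetical, and the real program does not consult a word list.

```python
# Hypothetical stand-in for landscape words seen in training.
# The real program learned from 10 million images, not from a word list.
TRAINING_VOCABULARY = {
    "coastal", "ripples", "cliffs", "lake", "river",
    "field", "volcano", "snowy", "and",
}

def split_by_distribution(prompt, vocabulary=TRAINING_VOCABULARY):
    """Split a prompt into words the toy vocabulary covers and words it does not."""
    words = prompt.lower().split()
    known = [w for w in words if w in vocabulary]
    unknown = [w for w in words if w not in vocabulary]
    return known, unknown

def out_of_distribution_share(prompt, vocabulary=TRAINING_VOCABULARY):
    """Fraction of the prompt's words that fall outside the toy vocabulary."""
    known, unknown = split_by_distribution(prompt, vocabulary)
    total = len(known) + len(unknown)
    return len(unknown) / total if total else 0.0

for prompt in ("coastal ripples and cliffs", "coastal ripples cliffs bike", "ZDNet technology"):
    print(prompt, "->", out_of_distribution_share(prompt))
```

The higher the share of unfamiliar words, the less the training data has to say about what the picture should contain, which is roughly where the strange artifacts begin.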

As the following series of images shows, the phrase "coastal ripples and cliffs" initially produces a very faithful image. Adding out-of-place words (bicycle, New York City, Cassandra) begins to alter and reshape the landscape in strange ways.

GauGAN2's automatic output for the phrase "coastal ripples and cliffs".

GauGAN2's automatic output for the phrase "Coastal Ripples Cliff Bike New York Cassandra Drill Aircraft Wisely Pneumatically Shows Off".

When all the landscape words are deleted, leaving only nonsense, something more interesting happens: strange, futuristic landscapes or multicolored amoebas appear before you.

GauGAN2's automatic output for the phrase "Cassandra drill plane wisely shows off pneumatically".

GauGAN2's automatic output for the words "show off".

GauGAN2's automatic output for the word "ostentatious".

GauGAN2's automatic output for the phrase "pneumatically show off wisely".

To experiment further, use longer phrases that are suggestive but not fully descriptive. Try feeding in the opening of T.S. Eliot's poem "The Waste Land": "April is the cruellest month, breeding lilacs out of the dead land."

The results are some eye-catching images that are actually somewhat apt. Rolling the dice yields many variations on a suitable landscape, in some cases with only minor artifacts.

"April is the cruellest month, breeding lilacs out of the dead land," T.S. Eliot, "The Waste Land."

Thanks to the innovations of StyleGAN, GauGAN can apply a style to an image, essentially adjusting the output to take on the form of another image rather than producing a mash-up.

Applying a style to the image generated from Eliot's lines distorts the faithful landscape beyond recognition. Once again, a host of strange objects appears, some with a disgusting organic quality, others mere fragments of what was once an image.

You can also submit images to GauGAN 2 and even draw on them. Submitting an old photo taken at Þingvellir, the site of the ancient Icelandic parliament, does not do much. In a limited test, the image remained mostly unchanged.

A photo taken at Þingvellir, the site of the ancient Icelandic parliament, was almost unchanged when submitted to GauGAN2.

However, adding the word "Þingvellir" produced a reasonably realistic landscape that is consistent with the Þingvellir site.

GauGAN2's output for the word "Þingvellir" is in keeping with the spirit of the ancient Icelandic landscape.

Adding the word "volcano" produces a striking alternative landscape, less realistic and more surreal.

GauGAN2's automatic output for the phrase "Þingvellir volcano".

Adding an out-of-place word, such as "technology", further disrupts the landscape and adds strange, nonsensical figures.

GauGAN2's automatic output for the phrase "Þingvellir technology".

As in the original GauGAN, you can draw instead of submitting a landscape photo. Once again, choosing something that does not fit the demo, not a landscape but a picture of a human head, produces more interesting results. If you want, you can use the mash-up function to re-skin the face. Rolling the dice produces interesting variations.

A drawing of a head, re-skinned using the layers function in GauGAN2.

Combining the drawing with the word "Þingvellir" produces subtle changes. Adding further words such as "volcano" and "rift valley" re-skins the image with a volcanic texture.

The drawing of the head is combined with the words "Þingvellir volcanic rift" and re-skinned using the layer function in GauGAN2.

Please note that the user interface of the application may be difficult to scroll in a desktop browser. For some reason, it seems to work better in tablet browsers (such as iPad).

© 2021 ZDNet, a Red Ventures company. All rights reserved.